Skip to content

Conversation

@lexnv
Copy link
Contributor

@lexnv lexnv commented Jan 9, 2025

This PR provides the partial results of the GetRecord kademlia query.

This significantly improves the authority discovery records, from ~37 minutes to ~2/3 minutes.
In contrast, libp2p discovers authority records in around ~10 minutes.

The authority discovery was slow because litep2p provided the records only after the Kademlia query was completed. A normal Kademlia query completes in around 40 seconds to a few minutes.
In this PR, partial records are provided as soon as they are discovered from the network.

Testing Done

Started a node in Kusama with --validator and litep2p backend.
The node discovered 996/1000 authority records in ~ 1 minute 45 seconds.

Screenshot 2025-01-09 at 12 26 08

Before & After

In this image, on the left side is libp2p, in the middle litep2p without this PR, on the right litep2p with this PR

Screenshot 2025-01-07 at 17 57 56

Closes: #7077

cc @paritytech/networking

@lexnv lexnv added T0-node This PR/Issue is related to the topic “node”. I5-enhancement An additional feature request. labels Jan 9, 2025
@lexnv lexnv self-assigned this Jan 9, 2025
Cargo.toml Outdated
linregress = { version = "0.5.1" }
lite-json = { version = "0.2.0", default-features = false }
litep2p = { version = "0.8.4", features = ["websocket"] }
litep2p = { git = "https://github.com/paritytech/litep2p.git", branch = "lexnv/kad-found-records", features = ["websocket"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will do one soon, wanted to double-check first on our Kusama validator that this indeed solves the problem 🙏

@lexnv
Copy link
Contributor Author

lexnv commented Jan 10, 2025

CPU

Litep2p without this PR was running before ~7:15 (spike—left side). On the right side is Litep2p with this PR.

The CPU consumption looks overall stable, with some fluctuations at ~10-minute intervals.
The peak CPU consumption has increased by ~6%, from 171% before to 182% current.

Considering the network's randomness and the ~18x speedup improvement in authority-discovery records, this small tradeoff (if any) appears justified.

Screenshot 2025-01-10 at 12 30 51

Triage Report

The Some network error occurred when fetching erasure chunk warning appears in the first 2 minutes of starting the node. In contrast, before this PR the warning appeared for ~30 minutes.

Repo Count Level Triage report
https://github.com/paritytech/polkadot-sdk/ 437 warn_if_frequent Some network error occurred when fetching erasure chunk
https://github.com/paritytech/polkadot-sdk/ 25 warn Recovery of available data failed.
https://github.com/paritytech/polkadot-sdk/ 25 warn Data unavailable for candidate .*
https://github.com/paritytech/polkadot-sdk/ 12 warn Report .: . to .. Reason: .. Banned, disconnecting. ( A collator provided a collation for the wrong para. Banned, disconnecting.)
https://github.com/paritytech/polkadot-sdk/ 4 warn fetch_pov_job
https://github.com/paritytech/polkadot-sdk/ 3 warn Importing locally an already known assignment
https://github.com/paritytech/polkadot-sdk/ 1 warn Report .: . to .. Reason: .. Banned, disconnecting. ( Same block request multiple times. Banned, disconnecting.)

@lexnv lexnv requested a review from a team January 15, 2025 11:42
@lexnv lexnv enabled auto-merge January 15, 2025 16:18
@lexnv lexnv added this pull request to the merge queue Jan 15, 2025
Merged via the queue into master with commit 77c78e1 Jan 15, 2025
249 of 261 checks passed
@lexnv lexnv deleted the lexnv/kad-found-records branch January 15, 2025 17:27
@paritytech-cmd-bot-polkadot-sdk
Copy link
Contributor

Created backport PR for stable2407:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7099-to-stable2407
git worktree add --checkout .worktree/backport-7099-to-stable2407 backport-7099-to-stable2407
cd .worktree/backport-7099-to-stable2407
git reset --hard HEAD^
git cherry-pick -x 77c78e1561bbe5ee0ecf414312bae82396ae6d11
git push --force-with-lease

@paritytech-cmd-bot-polkadot-sdk
Copy link
Contributor

Created backport PR for stable2409:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7099-to-stable2409
git worktree add --checkout .worktree/backport-7099-to-stable2409 backport-7099-to-stable2409
cd .worktree/backport-7099-to-stable2409
git reset --hard HEAD^
git cherry-pick -x 77c78e1561bbe5ee0ecf414312bae82396ae6d11
git push --force-with-lease

@paritytech-cmd-bot-polkadot-sdk
Copy link
Contributor

Created backport PR for stable2412:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-7099-to-stable2412
git worktree add --checkout .worktree/backport-7099-to-stable2412 backport-7099-to-stable2412
cd .worktree/backport-7099-to-stable2412
git reset --hard HEAD^
git cherry-pick -x 77c78e1561bbe5ee0ecf414312bae82396ae6d11
git push --force-with-lease

ordian added a commit that referenced this pull request Jan 16, 2025
* master: (33 commits)
  Implement `pallet-asset-rewards` (#3926)
  [pallet-revive] Add host function `to_account_id` (#7091)
  [pallet-revive] Remove revive events (#7164)
  [pallet-revive] Remove debug buffer (#7163)
  litep2p: Provide partial results to speedup GetRecord queries (#7099)
  [pallet-revive] Bump asset-hub westend spec version (#7176)
  Remove 0 as a special case in gas/storage meters (#6890)
  [pallet-revive] Fix `caller_is_root` return value (#7086)
  req-resp/litep2p: Reject inbound requests from banned peers (#7158)
  Add "run to block" tools (#7109)
  Fix reversed error message in DispatchInfo (#7170)
  approval-voting: Make importing of duplicate assignment idempotent (#6971)
  Parachains: Use relay chain slot for velocity measurement (#6825)
  PRDOC: Document `validate: false` (#7117)
  xcm: convert properly assets in xcmpayment apis (#7134)
  CI: Only format umbrella crate during umbrella check (#7139)
  approval-voting: Fix sending of assignments after restart (#6973)
  Retry approval on availability failure if the check is still needed (#6807)
  [pallet-revive-eth-rpc] persist eth transaction hash (#6836)
  litep2p: Sufix litep2p to the identify agent version for visibility (#7133)
  ...
EgorPopelyaev added a commit that referenced this pull request Jan 17, 2025
…ecord queries (#7192)

Backport #7099 into `stable2412` from lexnv.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Co-authored-by: Alexandru Vasile <[email protected]>
Co-authored-by: Alexandru Vasile <[email protected]>
Co-authored-by: Egor_P <[email protected]>
Nathy-bajo pushed a commit to Nathy-bajo/polkadot-sdk that referenced this pull request Jan 21, 2025
…tech#7099)

This PR provides the partial results of the `GetRecord` kademlia query.

This significantly improves the authority discovery records, from ~37
minutes to ~2/3 minutes.
In contrast, libp2p discovers authority records in around ~10 minutes. 

The authority discovery was slow because litep2p provided the records
only after the Kademlia query was completed. A normal Kademlia query
completes in around 40 seconds to a few minutes.
In this PR, partial records are provided as soon as they are discovered
from the network.

### Testing Done

Started a node in Kusama with `--validator` and litep2p backend.
The node discovered 996/1000 authority records in ~ 1 minute 45 seconds.

![Screenshot 2025-01-09 at 12 26
08](https://github.com/user-attachments/assets/b618bf7c-2bba-43a0-a021-4047e854c075)


### Before & After

In this image, on the left side is libp2p, in the middle litep2p without
this PR, on the right litep2p with this PR

![Screenshot 2025-01-07 at 17 57
56](https://github.com/user-attachments/assets/a8d467f7-8dc7-461c-bcff-163b94d01ae8)



Closes: paritytech#7077

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

I5-enhancement An additional feature request. T0-node This PR/Issue is related to the topic “node”.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

litep2p authority discovery is very slow 5m vs 1h

4 participants